Note on sampling without replacing from a finite collection of matrices 
This technical note supplies an affirmative answer to a question raised in a recent pre-print in the context of a
"matrix recovery" problem. Assume one samples $m$ Hermitian matrices $X_1, \ldots, X_m$ with replacement from a
finite collection. The deviation of the sum $X_1 + \cdots + X_m$ from its expected value in terms of the operator norm
can be estimated by an "operator Chernoff-bound" due to Ahlswede and Winter. The question arose whether
the bounds obtained this way continue to hold if the matrices are sampled without replacement. We remark that
a positive answer is implied by a classical argument by Hoeffding. Some consequences for the matrix recovery
problem are sketched.



This is a technical comment on [1]. While we provide a
(minimal) introduction, readers not familiar with [1] may find
the present note hard to follow.



A. Motivation 

The low-rank matrix recovery problem [1-10] is: Reconstruct
a low-rank matrix $p$ from $m$ randomly selected matrix
elements. The more general version introduced in [1] reads:
Reconstruct $p$ from $m$ randomly selected expansion coefficients
with respect to any fixed matrix basis.

Let us consider what seems to be the most mundane aspect
of the problem: the way in which the $m$ coefficients are "randomly
selected". Assume we are dealing with an $n \times n$ matrix
$p$. The statement of the matrix recovery problem calls for us
to sample $m$ of the $n^2$ coefficients characterizing $p$ without
replacing. This yields a random subset $\Omega$ consisting of $m$ of
the $n^2$ coefficients, from which the matrix $p$ is then to be recovered.

Due to the requirement that the drawn coefficients be dis- 
tinct, the m samples are not independent. Their dependency 
turns out to impede the technical analysis of the recovery al- 
gorithms. In order to avoid this complication, most authors 
chose to first analyze a variant where the revealed coefficients 
are drawn independently and then, in a second step, relate the 
modified question to the original one. Two such proxies for 
sampling without replacement have been discussed: 

1. The Bernoulli model [2, 3, 10]. Here, each of the $n^2$ coefficients
is assumed to be known with probability $m/n^2$. Thus
the number of revealed coefficients is itself a random variable
(with expectation value $m$). The minor drawback of this approach
is that, with finite probability, significantly more than
$m$ coefficients will be uncovered. These possible violations of
the rules of the original problem have to be factored in when
the success probability of the algorithm is computed.

2. The i.i.d. approach [1, 8, 9]. The known coefficients are
obtained by sampling $m$ times with replacement. The drawback
here is that, with fairly high probability, some coefficients
will be selected more than once. To understand why
this is undesirable, we need to recall some technical definitions
from [1].
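The three sampling models discussed above can be contrasted in a minimal Python sketch. The function names and the 0-based indexing of the $n^2$ coefficients are illustrative assumptions and are not taken from the cited papers.

```python
import random

def sample_without_replacement(n, m, rng):
    """Original problem: m distinct indices out of the n^2 coefficients."""
    return rng.sample(range(n * n), m)

def sample_bernoulli(n, m, rng):
    """Bernoulli proxy: each index is revealed independently with probability m / n^2."""
    p = m / (n * n)
    return [a for a in range(n * n) if rng.random() < p]

def sample_iid(n, m, rng):
    """i.i.d. proxy: m independent uniform draws, so repeats can occur."""
    return [rng.randrange(n * n) for _ in range(m)]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the run reproducible
    n, m = 8, 20
    omega = sample_without_replacement(n, m, rng)
    bern = sample_bernoulli(n, m, rng)
    iid = sample_iid(n, m, rng)
    print("without replacement:", len(omega), "indices,", len(set(omega)), "distinct")
    print("Bernoulli model:    ", len(bern), "indices (random count, mean m)")
    print("i.i.d. model:       ", len(iid), "indices,", len(set(iid)), "distinct")
```

Only the first sampler always returns exactly $m$ distinct indices; the Bernoulli sampler returns a random number of distinct indices, and the i.i.d. sampler returns exactly $m$ indices that need not be distinct.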

Let $A_1, \ldots, A_m$ be random variables taking values in
$[1, n^2]$. For now, assume the $A_i$'s are distributed uniformly
and independently. Let $\{w_a\}_{a=1}^{n^2}$ be an orthonormal Hermitian
basis in the space of $n \times n$ matrices. A central object in
the analysis is the sampling operator, defined as

$$ \mathcal{R} : p \mapsto \frac{n^2}{m} \sum_{i=1}^{m} \operatorname{tr}(p\, w_{A_i})\, w_{A_i}. \qquad (1) $$
If the $A_i$ are all distinct, then $\frac{m}{n^2}\mathcal{R}$ is a projection operator.
If, on the other hand, some basis elements occur more
than once, the spectrum of the sampling operator will be more
complicated. More importantly, the operator norm $\|\frac{m}{n^2}\mathcal{R}\|$
may become fairly large. The latter effect is undesirable, as
the logarithm of the operator norm appears as a multiplicative
constant in the final bound on the number of coefficients
which need to be known in order for the reconstruction process
to be successful.
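The effect of repeated indices can be made concrete with a minimal Python sketch. It relies on the observation that, because the $w_a$ are orthonormal, the rescaled operator $p \mapsto \sum_i \operatorname{tr}(p\, w_{A_i})\, w_{A_i}$ (that is, $\frac{m}{n^2}\mathcal{R}$) is diagonal in the basis $\{w_a\}$, with the eigenvalue belonging to $w_a$ equal to the number of times $a$ was drawn. The function names and the 0-based indexing are illustrative assumptions.

```python
from collections import Counter

def rescaled_sampling_spectrum(indices, dim):
    """Eigenvalues of (m/n^2) R in the basis {w_a}: the multiplicity of each index a."""
    counts = Counter(indices)
    return [counts.get(a, 0) for a in range(dim)]

def is_projection(spectrum):
    """A diagonal operator is a projection iff all its eigenvalues are 0 or 1."""
    return all(ev in (0, 1) for ev in spectrum)

def operator_norm(spectrum):
    """Operator norm of a positive diagonal operator: its largest eigenvalue."""
    return max(spectrum)

if __name__ == "__main__":
    dim = 9  # n^2 for n = 3
    distinct = [0, 3, 5]       # sampling without replacement
    repeated = [0, 3, 3, 3]    # index 3 drawn three times
    for label, idx in (("distinct", distinct), ("repeated", repeated)):
        spec = rescaled_sampling_spectrum(idx, dim)
        print(label, spec, "projection:", is_projection(spec), "norm:", operator_norm(spec))
```

In this picture the operator norm is simply the largest multiplicity among the drawn indices, which is 1 exactly when all $A_i$ are distinct.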

There seem to be three ways to cope with this problem.
First, use the worst-case estimate $\|\frac{m}{n^2}\mathcal{R}\| \le m$ (done in
Section II.C of [1]). Second, use the fact that the operator norm
is very likely to be of order $O(\log n)$ (suggested at the end of
Section II.C in [1] and implemented in later versions of [9]).
Third, prove that the arguments in [1] remain valid when the
$A_i$'s are chosen without replacement. Supplying such a proof
is the purpose of the present note.

Following earlier work [?], Ref. [1] reduces the analysis
of the matrix recovery problem to the problem of controlling
the operator norm of various linear functions of $\mathcal{R}$ (c.f.
Lemma 4 and Lemma 6 of [1]). This, in turn, is done by
employing a large-deviation bound for the sum of independent
matrix-valued random variables, which was derived in
[11]. Below, we point out that in some situations this bound
remains valid when the random variables are not independent,
but represent sampling without replacing.

B. Statement 

Let $C$ be a finite set. Fix $1 \le m \le |C|$ and, for $1 \le i \le m$, let $X_i$ be
a random variable taking values in $C$ with uniform probability.
We assume that all the $X_i$ are independent, so that
$X = (X_1, \ldots, X_m)$ is a $C^m$-valued random vector modeling
sampling with replacement from $C$. Likewise, let
$Y = (Y_1, \ldots, Y_m)$ be a random vector of elements of $C$ sampled uniformly
without replacement.

We are mainly interested in the case where $C$ is a finite set
of Hermitian matrices with some additional properties: We
assume the set is centered, $\mathbb{E}[X_i] = 0$, and that there are constants
$c, \sigma_0 \in \mathbb{R}$ bounding the operator norm $\|X_i\| \le c$ and
the variance $\|\mathbb{E}[X_i^2]\| \le \sigma_0^2$ of the random variables. Then:

Theorem 1 (Operator-Bernstein inequality). With the definitions
above, let $S_X = \sum_{i=1}^m X_i$ and $S_Y = \sum_{i=1}^m Y_i$. Let
$V = m\sigma_0^2$. Then for both $S = S_X$ and $S = S_Y$ it holds that

$$ \Pr[\|S\| > t] \le 2n \exp\left(-\frac{t^2}{4V}\right), \qquad (2) $$

for $t \le 2V/c$, and

$$ \Pr[\|S\| > t] \le 2n \exp\left(-\frac{t}{2c}\right), \qquad (3) $$

for larger values of $t$.
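For orientation, the two regimes of Theorem 1 can be evaluated numerically. The sketch below assumes the exponents $-t^2/(4V)$ for $t \le 2V/c$ and $-t/(2c)$ for larger $t$, which is the usual form of the operator-Bernstein inequality; the function name, the argument names, and the capping of the bound at 1 are illustrative choices.

```python
import math

def operator_bernstein_bound(t, n, m, sigma0_sq, c):
    """Upper bound on Pr[||S|| > t] for a sum S of m centered n x n Hermitian matrices.

    Assumes ||X_i|| <= c and ||E[X_i^2]|| <= sigma0_sq, and V = m * sigma0_sq.
    """
    V = m * sigma0_sq
    if t <= 2 * V / c:
        exponent = -t * t / (4 * V)   # sub-Gaussian regime, Eq. (2)
    else:
        exponent = -t / (2 * c)       # sub-exponential regime, Eq. (3)
    # A probability never exceeds 1, so the bound is capped there.
    return min(1.0, 2 * n * math.exp(exponent))

if __name__ == "__main__":
    n, m, sigma0_sq, c = 4, 100, 1.0, 1.0
    for t in (0.0, 40.0, 200.0, 400.0):
        print(t, operator_bernstein_bound(t, n, m, sigma0_sq, c))
```

At the crossover $t = 2V/c$ both exponents equal $-V/c^2$, so the bound is continuous in $t$.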

The version involving $S_X$ has been proved in [1] as a minor
variation of the operator-Chernoff bound from [11]. In the
proof, the failure probability is bounded from above in terms
of the "operator moment-generating function"

$$ M_X(\lambda) = \mathbb{E}[\operatorname{tr} \exp(\lambda S_X)]. $$

To establish the more general statement, it would be sufficient
to show that $M_Y \le M_X$. In fact, this relation is well-known
to hold for real-valued random variables. One popular way of
proving it involves the notion of negative association [12, 13].
Indeed, the author of [1] tried to generalize this concept to the
case of matrix-valued random variables, but failed to overcome
its apparent dependency on the total order of the real
numbers. However, he overlooked a much older and more elementary
argument given in [14], which only relies on certain
convexity properties and applies without change to the matrix-valued
case (see below).
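The relation $M_Y \le M_X$ can be checked by exact enumeration on a small example. The sketch below is only an illustration under stated assumptions: it uses real symmetric $2 \times 2$ matrices stored as triples `(a, b, d)` for `[[a, b], [b, d]]`, so that the eigenvalues are available in closed form, and the example collection (plus and minus two Pauli-type matrices) is a hypothetical choice. All names are illustrative.

```python
import math
from itertools import permutations, product

def tr_exp(lam, mat):
    """tr exp(lam * M) for the real symmetric 2x2 matrix M = [[a, b], [b, d]]."""
    a, b, d = mat
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)   # eigenvalues of M are mean +/- radius
    return math.exp(lam * (mean + radius)) + math.exp(lam * (mean - radius))

def matrix_sum(mats):
    """Entrywise sum of matrices stored as (a, b, d) triples."""
    return tuple(sum(col) for col in zip(*mats))

def mgf(lam, collection, m, replace):
    """E[tr exp(lam * S)], averaging over all equally likely ordered samples of size m."""
    if replace:
        samples = product(collection, repeat=m)   # sampling with replacement (X)
    else:
        samples = permutations(collection, m)     # sampling without replacement (Y)
    values = [tr_exp(lam, matrix_sum(s)) for s in samples]
    return sum(values) / len(values)

if __name__ == "__main__":
    # Centered example collection: +Z, -Z, +X, -X (entries sum to zero).
    collection = [(1, 0, -1), (-1, 0, 1), (0, 1, 0), (0, -1, 0)]
    for lam in (-1.0, 0.5, 2.0):
        m_y = mgf(lam, collection, 3, replace=False)
        m_x = mgf(lam, collection, 3, replace=True)
        print(lam, m_y, m_x, m_y <= m_x)
```

The enumeration is exponential in $m$ and is meant only as a sanity check on toy instances, not as a way to evaluate the bound in practice.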

C. Implications 

As a consequence of Theorem 1, the analysis in Section II.C
of [1] can be simplified and improved by setting the constant
$C$ equal to one. The remark at the end of that section applies.
In particular, in the rest of that paper, one may assume that
$\|\Delta_T\|_2 < n^{1/2} \|\Delta_T^\perp\|_2$. Thus, the conditions on the certificate
$Y$ in Section II.E may be relaxed to $\|P_T Y - \operatorname{sgn} p\|_2 \le \frac{1}{2 n^{1/2}}$.
This implies that $l$, the number of iterations of the "golfing
scheme", may be reduced to $l = \lceil \log_2(2 n^{1/2} \sqrt{r}) \rceil$. The estimates
on $|\Omega|$ in Theorems 1, 2, and 3 therefore all improve by
a factor of

$$ \frac{\log_2 n^2}{\log_2 n^{1/2}} = 4. $$
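The arithmetic behind the factor of 4 can be reproduced in a few lines of Python. The sketch assumes the reduced iteration count $l = \lceil \log_2(2 n^{1/2} \sqrt{r}) \rceil$ and the ratio $\log_2 n^2 / \log_2 n^{1/2}$ stated above; the function names are illustrative.

```python
import math

def golfing_iterations(n, r):
    """Reduced number of golfing iterations, l = ceil(log2(2 * sqrt(n) * sqrt(r)))."""
    return math.ceil(math.log2(2 * math.sqrt(n) * math.sqrt(r)))

def improvement_factor(n):
    """Ratio log2(n^2) / log2(n^(1/2)); equals 4 for every n > 1."""
    return math.log2(n ** 2) / math.log2(math.sqrt(n))

if __name__ == "__main__":
    for n, r in ((16, 4), (1024, 1)):
        print(n, r, golfing_iterations(n, r), improvement_factor(n))
```

The ratio does not depend on $n$, since $\log_2 n^2 = 2 \log_2 n$ and $\log_2 n^{1/2} = \tfrac{1}{2} \log_2 n$.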

In [9], Proposition 3.3 becomes superfluous. The final
bounds improve accordingly.

The consequences are more pronounced for an upcoming
detailed analysis [15] of noise resilience (in the spirit of [16])
of quantum mechanical applications.



The present note makes no statements about approaches
which either rely on the Bernoulli model, or use the non-commutative
Khintchine inequality instead of the operator
Chernoff bound [2, 3, 10].

Finally, note that the "golfing scheme" employed in [1, 9]
demands that $l$ independent batches of coefficients be sampled.
As a consequence of Theorem 1, every single batch may
be assumed to be drawn without replacement. However, for
technical reasons, it is still necessary that the batches remain
independent. This does not constitute a problem. Indeed, let $\tilde{\Omega}$
be the set of distinct coefficients used by the golfing scheme.
It is shown that, with high probability, there exists a "dual
certificate" in the space spanned by the basis elements corresponding
to the coefficients in $\tilde{\Omega}$. Since $\tilde{\Omega}$ is just a random
subset of cardinality $|\tilde{\Omega}| \le m$, the probability that there is a
dual certificate in the space spanned by $m$ distinct random basis
elements (obtained from sampling without replacing) can
only be higher. A very similar argument has recently been
given in [10], where the golfing scheme has been modified to
work with the Bernoulli model.

D. Proof 

In this section, we repeat an argument from [14] which implies
that for all $\lambda \in \mathbb{R}$ the inequality $M_Y(\lambda) \le M_X(\lambda)$
holds. We emphasize that the proof of [14] does not need to
be modified in order to apply to matrix-valued random variables.
However, the version given below makes some steps explicit
which were omitted in the original paper.

For now, let C be any finite set; let X, Y be as above. 

The central observation is that one can generate the distribution
of $X$ by first sampling $y = (y_1, \ldots, y_m)$ without
replacement, and then drawing the $(x_1, \ldots, x_m)$ from
$\{y_1, \ldots, y_m\}$ in a certain (unfortunately not completely trivial)
way.

To make that second step precise, we introduce a random
partial function $Z$ from $C^m$ to $C^m$. The domain of $Z$ is the
set of vectors $y \in C^m$ with pairwise different components
($y_i \ne y_j$ for $i \ne j$). Given such a vector $y$, we sequentially assign
values to the components $Z_1, \ldots, Z_m$ of $Z(y)$ by sampling
from $\{y_1, \ldots, y_m\}$ according to the following recipe. At the
$k$th step, let $D_k$ be the subset of $\{y_1, \ldots, y_m\}$ of values which
have already been drawn in a previous step. To get $Z_k$:

1. with probability $|D_k|/|C|$ take a random element from $D_k$,
and

2. with probability $1 - |D_k|/|C|$ take a random element from
the $\{y_1, \ldots, y_m\}$ not contained in $D_k$.

(Here, by a "random" element, we mean one sampled uniformly
at random from the indicated set.) Then

Lemma 2. With the definitions above, X and Z(Y) are iden- 
tically distributed. 

What is more, if $C$ is a subset of a vector space, then

$$ \mathbb{E}_Z\Big[\sum_{i=1}^{m} Z_i(Y)\Big] = \sum_{i=1}^{m} Y_i. \qquad (4) $$
Proof. Choose $k \in \{1, \ldots, m\}$, let $x \in C^m$. We compute the
conditional probability

$$ \Pr[Z_k(Y) = x_k \mid Z_1(Y) = x_1, \ldots, Z_{k-1}(Y) = x_{k-1}]. $$

If there is a $j < k$ such that $x_k = x_j$, then, according to the
first rule above, the probability is

$$ \frac{|D_k|}{|C|} \cdot \frac{1}{|D_k|} = \frac{1}{|C|}. $$

Otherwise, by the second rule, the probability reads

$$ \Big(1 - \frac{|D_k|}{|C|}\Big) \frac{1}{|C| - |D_k|} = \frac{1}{|C|} $$

as well. Iterating:

$$ \begin{aligned}
\Pr[Z_1(Y) = x_1, \ldots, Z_m(Y) = x_m]
&= \Pr[Z_1(Y) = x_1, \ldots, Z_{m-1}(Y) = x_{m-1}] \, \frac{1}{|C|} \\
&= \Pr[Z_1(Y) = x_1, \ldots, Z_{m-2}(Y) = x_{m-2}] \, \frac{1}{|C|^2} \\
&= \cdots = \frac{1}{|C|^m}.
\end{aligned} $$

This proves the first claim.

We turn to the second statement. The left hand side of (4)
is manifestly a linear combination of the random variables $Y_i$.
From the definition of $Z$, it is also invariant under any permutation
$Y_i \mapsto Y_{\pi(i)}$. As a linear and symmetric function, it is of
the form $K \sum_{i=1}^{m} Y_i$ for some constant $K$. To compute $K$, we
use the fact that the $Y_i$ are identically distributed, so that

$$ \mathbb{E}_Y\Big[\mathbb{E}_Z\Big[\sum_{i=1}^{m} Z_i(Y)\Big]\Big] = m\,\mathbb{E}[Y_1], \qquad
\mathbb{E}_Y\Big[K \sum_{i=1}^{m} Y_i\Big] = K m\,\mathbb{E}[Y_1]. $$

Thus $K = 1$ and we are done. □
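Both claims of Lemma 2 can be verified by exact enumeration on a toy instance. The following Python sketch is an illustration under stated assumptions: it takes the two rules for $Z_k$ to use the probabilities $|D_k|/|C|$ and $1 - |D_k|/|C|$, represents the elements of $C$ by integers, and uses exact rational arithmetic. All function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def z_distribution(y, size_c):
    """Exact law of Z(y) for a vector y with pairwise distinct components."""
    dist = {(): Fraction(1)}
    for _ in range(len(y)):
        new = {}
        for prefix, prob in dist.items():
            drawn = set(prefix)                        # D_k: values drawn so far
            fresh = [v for v in y if v not in drawn]   # components of y not yet drawn
            p_old = Fraction(len(drawn), size_c)       # probability of rule 1
            for v in drawn:                            # rule 1: uniform on D_k
                key = prefix + (v,)
                new[key] = new.get(key, 0) + prob * p_old / len(drawn)
            for v in fresh:                            # rule 2: uniform on the rest
                key = prefix + (v,)
                new[key] = new.get(key, 0) + prob * (1 - p_old) / len(fresh)
        dist = new
    return dist

def law_of_z_of_y(collection, m):
    """Exact law of Z(Y), with Y uniform over ordered samples without replacement."""
    ys = list(permutations(collection, m))
    total = {}
    for y in ys:
        for x, prob in z_distribution(y, len(collection)).items():
            total[x] = total.get(x, 0) + prob / len(ys)
    return total

def expected_sum(y, size_c):
    """E_Z[sum_i Z_i(y)] for a fixed y with numeric components."""
    return sum(prob * sum(x) for x, prob in z_distribution(y, size_c).items())

if __name__ == "__main__":
    law = law_of_z_of_y([0, 1, 2, 3], 3)
    print(len(law), set(law.values()))     # every x in C^m should have mass 1/|C|^m
    print(expected_sum((5, 7, 11), 6))     # should equal 5 + 7 + 11
```

The first check confirms that $Z(Y)$ is uniform on $C^m$, as $X$ is; the second confirms identity (4) for one fixed $y$.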



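Now let $f$ be a convex function on the convex hull of $C$.
Using Jensen's inequality and Lemma 2,

$$ \mathbb{E}\Big[f\Big(\sum_{i=1}^{m} Y_i\Big)\Big]
= \mathbb{E}_Y\Big[f\Big(\mathbb{E}_Z\Big[\sum_{i=1}^{m} Z_i(Y)\Big]\Big)\Big]
\le \mathbb{E}_Y \mathbb{E}_Z\Big[f\Big(\sum_{i=1}^{m} Z_i(Y)\Big)\Big]
= \mathbb{E}\Big[f\Big(\sum_{i=1}^{m} X_i\Big)\Big]. $$

Finally, specialize to the case where $C$ is a finite set of Hermitian
matrices. Since the function $c \mapsto \operatorname{tr} \exp(\lambda c)$ is convex
on the set of Hermitian matrices for all $\lambda \in \mathbb{R}$, any upper
bound on moment generating functions derived for matrix-valued
sampling with replacing is also valid for sampling
without replacing.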